ETRA Challenge Report¶
Introduction¶
In the ETRA Challenge, 8 subjects were tested on various tasks including visual fixation, visual search, and visual exploration. Multiple stimuli were presented, such as natural images, blank scenes, or images from the "Where's Waldo?" books. We came up with two hypotheses, which we want to test:
- There will be a difference in eye movement trajectories between the variants Fixation and FreeViewing in the task type Blank.
- There will be a difference in pupil size based on the task, and it will also depend on the time elapsed since the start of the experiment.
Data Format¶
The dataset is downloaded into the directory data/, which contains:
- data/ - Directory with a subdirectory for each participant, whose data are stored in CSV files representing individual trials. Each file contains 45 seconds of eye tracker data.
- images/ - Contains the pictures shown during the experiment.
- DataSummary.csv - Each row contains info about the clicks of the given participant during a trial.
A description of the dataset is on the page ETRA dataset description.
Setup¶
The following part sets up the environment to properly run the experiment.
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
# Apply the default theme
sns.set_theme()
%matplotlib inline
# Enables plotting without an explicit call to .show()
%load_ext pretty_jupyter
# Load the dataset using the prepared module `etra.py`.
from etra import ETRA
dataset = ETRA()
Load Data¶
The following part defines a function which easily loads the specified data from the dataset based on the subject, fv_fixation and task_type parameters.
from etra import read_data
def load_data(
    subject="*",
    fv_fixation="*",
    task_type="*",
):
    """
    Loads the selected data and sorts it by time.
    """
    selected_files = (dataset.data_dir / "data").glob(f"{subject}/*_{fv_fixation}_{task_type}_*.csv")
    return pd.concat((read_data(f) for f in selected_files)).sort_values(by="Time")
Examples:¶
# Load participant `009`, `Fixation` task and type `Waldo`.
df_009_fix_waldo = load_data(
subject="009",
fv_fixation="Fixation",
task_type="Waldo",
)
df_009_fix_waldo.head()
| participant_id | trial_id | fv_fixation | task_type | stimulus_id | Time | LXpix | LYpix | RXpix | RYpix | LXhref | LYhref | RXhref | RYhref | LP | RP | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 009 | 016 | Fixation | Waldo | wal014 | 1614108 | 473.22 | 317.625 | 468.18 | 314.700 | -802.0 | 2960.0 | -858.0 | 2931.0 | 722 | 903 |
| 1 | 009 | 016 | Fixation | Waldo | wal014 | 1614110 | 473.30 | 317.775 | 467.78 | 314.775 | -801.0 | 2962.0 | -862.0 | 2932.0 | 720 | 904 |
| 2 | 009 | 016 | Fixation | Waldo | wal014 | 1614112 | 473.38 | 317.925 | 467.46 | 314.925 | -800.0 | 2963.0 | -866.0 | 2934.0 | 718 | 905 |
| 3 | 009 | 016 | Fixation | Waldo | wal014 | 1614114 | 473.30 | 319.125 | 465.14 | 315.750 | -800.0 | 2977.0 | -889.0 | 2945.0 | 720 | 910 |
| 4 | 009 | 016 | Fixation | Waldo | wal014 | 1614116 | 474.82 | 317.550 | 465.46 | 316.800 | -785.0 | 2958.0 | -886.0 | 2957.0 | 723 | 908 |
# Load `Waldo` data from all the participants and both `Fixation` and `FreeViewing`.
df_waldo = load_data(
task_type="Waldo",
)
df_waldo.head()
| participant_id | trial_id | fv_fixation | task_type | stimulus_id | Time | LXpix | LYpix | RXpix | RYpix | LXhref | LYhref | RXhref | RYhref | LP | RP | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 062 | 001 | Fixation | Waldo | wal002 | 262296 | 461.70 | 297.375 | 467.62 | 299.550 | -1418.0 | 3197.0 | -1353.0 | 3218.0 | 744 | 962 |
| 1 | 062 | 001 | Fixation | Waldo | wal002 | 262298 | 462.58 | 298.875 | 469.06 | 300.825 | -1425.0 | 3197.0 | -1354.0 | 3216.0 | 744 | 963 |
| 2 | 062 | 001 | Fixation | Waldo | wal002 | 262300 | 463.22 | 298.650 | 468.98 | 301.125 | -1418.0 | 3194.0 | -1355.0 | 3220.0 | 746 | 961 |
| 3 | 062 | 001 | Fixation | Waldo | wal002 | 262302 | 463.22 | 299.400 | 469.46 | 301.050 | -1418.0 | 3203.0 | -1350.0 | 3219.0 | 747 | 961 |
| 4 | 062 | 001 | Fixation | Waldo | wal002 | 262304 | 462.58 | 299.250 | 469.30 | 302.025 | -1425.0 | 3201.0 | -1352.0 | 3230.0 | 747 | 962 |
Hypotheses¶
The following part describes the hypotheses and their testing.
Hypothesis 1:¶
"There will be difference in eye movement trajectories between variants Fixation and FreeViewing in the task type Blank."¶
During the Fixation task, participants must keep their focus on a fixed point on the screen.
On the other hand, during the FreeViewing task they are able to move their eyes in any direction.
We expect that the trajectory of the eye movement will differ in the puzzle task types (Puzzle and Waldo),
because it would be impossible to solve the puzzle without moving the eyes (in the FreeViewing part).
We also expect that the participants would explore the picture in the task type Natural, so the trajectories would differ there too.
The interesting example could be the task type Blank: in this task there is nothing in the picture, just a blank screen.
During the FreeViewing task the participant could either stare at a fixed point of the picture, which would lead
to the same trajectories as in the Fixation tasks, or randomly move their focus around the picture.
This idea is the main motivation to test the hypothesis above.
(Note: The original hypothesis was that the trajectories would differ in the tasks Puzzle and Waldo.
However, that hypothesis was based on a wrong understanding of the task Fixation for the types Puzzle and Waldo,
and it does not make much sense to test it now, as it would almost certainly lead to a difference in the trajectories.
This is the reason why we decided to change the hypothesis to the one described above.)
Data Preparation¶
For this experiment, we restricted ourselves to the data of the task type Blank from an arbitrarily chosen participant, 009. We have chosen only one participant because the selected statistical tests assume independent observations; this condition would not be fulfilled with multiple participants.
Then we used the following columns (the names are renamed variants of the names from the original table; the original names can be found below):
- time - Time in milliseconds from the beginning of the trial (normalized from absolute values to relative ones, starting from 0).
- fv_fixation - Indicator of either the FreeViewing or Fixation task.
- trial - ID of the trial.
- x, y - Coordinates of the left eye focus. To simplify the testing we decided to study only the left eye trajectory, because we expect that the trajectories of the left and right eye are similar.
In the following blocks of code we provide example of the described data preparation.
# Extraction of the data from the participant with ID `009` for the `Blank` task type.
hyp1_var = load_data(subject="009", task_type="Blank")
# Display first rows of the extracted data.
hyp1_var.head()
| participant_id | trial_id | fv_fixation | task_type | stimulus_id | Time | LXpix | LYpix | RXpix | RYpix | LXhref | LYhref | RXhref | RYhref | LP | RP | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 009 | 004 | FreeViewing | Blank | grey | 524232 | 533.22 | 245.775 | 523.70 | 235.050 | 566.0 | 521.0 | 454.0 | 413.0 | 823 | 964 |
| 1 | 009 | 004 | FreeViewing | Blank | grey | 524234 | 533.94 | 246.300 | 523.38 | 233.925 | 573.0 | 525.0 | 449.0 | 401.0 | 820 | 973 |
| 2 | 009 | 004 | FreeViewing | Blank | grey | 524236 | 531.94 | 247.200 | 521.78 | 233.475 | 553.0 | 538.0 | 432.0 | 398.0 | 823 | 964 |
| 3 | 009 | 004 | FreeViewing | Blank | grey | 524238 | 533.30 | 246.600 | 523.38 | 238.650 | 566.0 | 530.0 | 454.0 | 452.0 | 824 | 977 |
| 4 | 009 | 004 | FreeViewing | Blank | grey | 524240 | 532.02 | 247.575 | 524.34 | 238.725 | 554.0 | 542.0 | 464.0 | 452.0 | 823 | 977 |
# Rename the parameters for the future usage.
hyp1_var = hyp1_var\
.rename(
{
"Time": "time",
"trial_id": "trial",
"fv_fixation" : "fv_fixation",
"LXpix": "x",
"LYpix": "y",
},
axis = 1
)
# Drop the redundant parameters.
hyp1_var = hyp1_var\
[
[
"time",
"trial",
"fv_fixation",
"x",
"y",
]
]
# Time normalization (to start from 0).
hyp1_var["time"] = hyp1_var.groupby(["trial"])["time"].transform(lambda x: x - x.min())
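The per-trial normalization above can be illustrated on a tiny synthetic table (the trial IDs and timestamps below are made up for illustration):

```python
import pandas as pd

# Two synthetic trials with absolute timestamps.
toy = pd.DataFrame({
    "trial": ["001", "001", "002", "002"],
    "time":  [500, 510, 900, 904],
})
# Subtract each trial's first timestamp so every trial starts at 0.
toy["time"] = toy.groupby("trial")["time"].transform(lambda x: x - x.min())
print(toy["time"].tolist())  # → [0, 10, 0, 4]
```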
Distance Computation¶
The next step of the pre-processing was to reduce the amount of data in each trial. We divided the data into intervals of 10 consecutive measurements and computed the mean of the coordinates (columns x and y). In most cases these correspond to 20 ms intervals of the trial (measurements are typically separated by 2 ms). Computing the mean should also filter out random fluctuations of the eye movement data caused by inaccurate measurement. Then, we computed the distance between each pair of consecutive interval means within the trial. This value describes the movement of the eye focus in the given time interval, and we used it to test the first hypothesis. Finally, we removed the redundant columns.
The described preprocessing can be seen in the following code blocks.
def create_interval_means(
    data: pd.DataFrame,
    x_col="x",
    y_col="y",
    interval_size=10,
):
    """
    Creates a new table of coordinate means for intervals of size `interval_size`.
    """
    num_rows = data.shape[0]
    datas = []
    for start_point in range(0, num_rows, interval_size):
        # Positional (`iloc`) slicing is half-open, so intervals do not overlap,
        # and it works even when the group's index labels do not start at 0.
        slice_data = data.iloc[start_point:start_point + interval_size][[x_col, y_col]]
        if not slice_data.empty:
            datas.append(slice_data.mean())
    return pd.DataFrame(datas)
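The same block averaging can also be sketched with an integer-division grouper; a hedged alternative on synthetic coordinates, not the notebook's implementation:

```python
import pandas as pd

# Synthetic coordinates: 4 samples, interval size 2.
toy = pd.DataFrame({"x": [0.0, 2.0, 10.0, 12.0], "y": [0.0, 2.0, 4.0, 6.0]})
interval_size = 2
# Group consecutive rows into blocks of `interval_size` and average each block.
means = toy.groupby(toy.index // interval_size)[["x", "y"]].mean()
print(means["x"].tolist(), means["y"].tolist())  # → [1.0, 11.0] [1.0, 5.0]
```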
def compute_distance(
data : pd.DataFrame,
x_col="x",
y_col="y",
dist_name="distance",
):
"""
Computes distance between each pair of the consecutive coordinates.
"""
# Compute distance between consecutive pairs of points.
data[dist_name] = np.sqrt((data[x_col] - data[x_col].shift())**2 + (data[y_col] - data[y_col].shift())**2)
# Remove the first row, which has a NaN value for distance.
return data.dropna()
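A quick sanity check of the distance formula on a synthetic 3-4-5 right triangle (the two points below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Two consecutive points forming a 3-4-5 right triangle.
toy = pd.DataFrame({"x": [0.0, 3.0], "y": [0.0, 4.0]})
toy["distance"] = np.sqrt(
    (toy["x"] - toy["x"].shift()) ** 2 + (toy["y"] - toy["y"].shift()) ** 2
)
# The first row has no predecessor, hence NaN, and is dropped.
print(toy.dropna()["distance"].tolist())  # → [5.0]
```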
# Compute means of the coordinates for each trial in fixed time intervals. Then compute distance between each pair of the consecutive coordinate means.
hyp1_var_avg_x_y = hyp1_var\
.groupby(["trial", "fv_fixation"])\
.apply(create_interval_means)\
.reset_index()\
.groupby(["trial", "fv_fixation"])\
.apply(compute_distance)\
.reset_index(drop=True)\
.groupby(["trial", "fv_fixation"])\
.agg(avg_distance=("distance", "mean"))\
.reset_index()\
.drop("trial", axis=1)
# Show the prepared data for hypothesis testing.
hyp1_var_avg_x_y.head()
| fv_fixation | avg_distance | |
|---|---|---|
| 0 | FreeViewing | 5.826678 |
| 1 | Fixation | 1.375381 |
| 2 | FreeViewing | 7.560105 |
| 3 | Fixation | 1.690413 |
| 4 | FreeViewing | 9.837648 |
Hypothesis testing¶
We have to formally define the null and alternative hypotheses to test. Let $\mu_0$ be the mean of the average traveled distance for FreeViewing and $\mu_1$ for Fixation, both in the Blank task type.
$H_0$ (null hypothesis): $\mu_0 = \mu_1$
$H_1$ (alternative hypothesis): $\mu_0 \neq \mu_1$
We compute the summary statistics for average traveled distance for both FreeViewing and Fixation tasks.
hyp1_var_avg_x_y.groupby("fv_fixation").describe().round(decimals=2)
| avg_distance | ||||||||
|---|---|---|---|---|---|---|---|---|
| count | mean | std | min | 25% | 50% | 75% | max | |
| fv_fixation | ||||||||
| Fixation | 15.0 | 1.80 | 0.41 | 1.38 | 1.44 | 1.69 | 2.26 | 2.53 |
| FreeViewing | 15.0 | 6.17 | 2.12 | 3.36 | 4.25 | 5.95 | 7.42 | 9.84 |
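Beyond a significance test, an effect size indicates how large the gap between the groups is. A minimal sketch of Cohen's d, assuming equal group sizes and using the rounded means and standard deviations from the summary table above:

```python
import math

# Means and standard deviations taken from the summary table above.
mean_fix, std_fix = 1.80, 0.41
mean_free, std_free = 6.17, 2.12
# Pooled standard deviation for equal group sizes.
pooled_sd = math.sqrt((std_fix**2 + std_free**2) / 2)
cohens_d = (mean_free - mean_fix) / pooled_sd
print(round(cohens_d, 2))  # → 2.86 (a very large effect)
```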
We have decided to use the paired t-test, which is widely used to test data points linked to each other. This test assumes that we are testing independent observations (fulfilled by choosing only one participant) drawn from a normal distribution with unknown variance. Let us then look at the distributions of avg_distance:
# Plot the distribution of distances.
g = sns.displot(
data=hyp1_var_avg_x_y,
x='avg_distance',
hue='fv_fixation',
kind="kde",
)
g.set_axis_labels("Distance (pixels)")
plt.title("Distribution of mean traveled distance in 20 ms interval in trials")
Text(0.5, 1.0, 'Distribution of mean traveled distance in 20 ms interval in trials')
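The visual check could be complemented with a formal normality test such as Shapiro-Wilk; a sketch on a synthetic normal sample standing in for one group's avg_distance values (the sample itself is made up for illustration):

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for one group's `avg_distance` values (15 trials),
# drawn from a normal distribution for illustration.
rng = np.random.default_rng(0)
sample = rng.normal(loc=6.0, scale=2.0, size=15)
# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal distribution.
stat, p = stats.shapiro(sample)
print(round(p, 3))  # a high p-value means normality is not rejected
```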
The distributions above look roughly normal, so we can use the paired t-test:
# Run t-test:
stats.ttest_rel(
hyp1_var_avg_x_y[(hyp1_var_avg_x_y.fv_fixation == "FreeViewing")].avg_distance,
hyp1_var_avg_x_y[(hyp1_var_avg_x_y.fv_fixation == "Fixation")].avg_distance,
)
Ttest_relResult(statistic=7.553155546087788, pvalue=2.6562601597399225e-06)
The resulting p-value from the paired t-test is p = 2.656e-06. Using the significance level $\alpha = 0.05$ we reject the null hypothesis $H_0$, meaning that the trajectories of the eye movement differ between the FreeViewing and Fixation tasks in the Blank task type for participant 009.
Discussion¶
We have rejected the null hypothesis $H_0$. The reason for this result could be that during the FreeViewing task the participant randomly moved their focus between several arbitrary points of the blank picture, whereas in the Fixation task they tried to focus on the fixed point. The distribution of the mean distance of the eye focus above also supports this idea: during the FreeViewing task the distribution of the traveled distance had a higher variance, and its mean was also significantly higher than in the Fixation task.
To better understand the different behaviour between FreeViewing and Fixation, future work could compare the results of the other task types with each other. These results could, for example, be used to determine the difference between the trajectories of the puzzle-solving tasks and the Natural and Blank tasks, or to determine how the focus of the participants differs in the Fixation task across task types. The hypothesis testing above could also be improved by filtering erroneous movements from the data instead of computing interval means.
Hypothesis 2:¶
"There will be difference in pupil size based on the task and it will also depend on the time elapsed since the start of the experiment".¶
The motivation to test this hypothesis is the correlation between pupil size and the excitement of the participant. During Blank the participant probably will not be very excited and the average pupil size should be relatively small. It is hard to predict the behaviour during the task type Natural, because the pupil size could depend on the specific picture (whether it is interesting or not). In the tasks Puzzle and Waldo, we expect that the pupil size will depend on the success of solving the task. Additionally, we expect that the excitement of the participants will decline over time, and so will the pupil size.
To test the hypothesis above, we decided to compare the average pupil size between the task Blank and the other types of tasks.
Data Preparation¶
To test the hypothesis, we extracted data from only one arbitrarily chosen participant with ID 059 (because of the usage of the t-test). We also extracted only the data from the FreeViewing task, because the Fixation task is not relevant for testing our hypothesis.
We extracted and renamed the following columns:
- time - Time in milliseconds from the beginning of the trial (normalized from absolute values to relative ones, starting from 0).
- trial - ID of the trial.
- task - Type of the task.
- lp - Size of the left pupil. We arbitrarily selected the left pupil as a simplification of the problem, because we expect the size to change synchronously in both eyes.
# We choose only subject `059` and `FreeViewing` task.
hyp2_var = load_data(subject="059", fv_fixation="FreeViewing")
hyp2_var.head()
| participant_id | trial_id | fv_fixation | task_type | stimulus_id | Time | LXpix | LYpix | RXpix | RYpix | LXhref | LYhref | RXhref | RYhref | LP | RP | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 059 | 001 | FreeViewing | Natural | nat009 | 952510 | 450.50 | 224.700 | 458.10 | 191.550 | -1601.0 | 4313.0 | -1494.0 | 3939.0 | 2327 | 2158 |
| 1 | 059 | 001 | FreeViewing | Natural | nat009 | 952512 | 450.58 | 223.650 | 457.22 | 197.100 | -1600.0 | 4300.0 | -1508.0 | 4001.0 | 2334 | 2223 |
| 2 | 059 | 001 | FreeViewing | Natural | nat009 | 952514 | 450.66 | 224.325 | 459.78 | 198.525 | -1600.0 | 4308.0 | -1482.0 | 4019.0 | 2344 | 2224 |
| 3 | 059 | 001 | FreeViewing | Natural | nat009 | 952516 | 450.34 | 223.425 | 458.74 | 198.300 | -1602.0 | 4297.0 | -1493.0 | 4016.0 | 2325 | 2246 |
| 4 | 059 | 001 | FreeViewing | Natural | nat009 | 952518 | 449.70 | 223.500 | 457.86 | 199.200 | -1608.0 | 4297.0 | -1502.0 | 4025.0 | 2325 | 2247 |
# Rename the parameters.
hyp2_var = hyp2_var.rename(
{
"Time": "time",
"trial_id": "trial",
"task_type": "task",
"LP": "lp",
},
axis = 1
)
# Get only necessary parameters. Choose only left pupil.
hyp2_var = hyp2_var\
[
[
"time",
"trial",
"task",
"lp",
]
]
# Normalize time to always start from the 0.
hyp2_var["time"] = hyp2_var\
.groupby("trial")["time"]\
.transform(lambda x: x - x.min())
Blinks Removal¶
Because we wanted to analyze the change of the pupil size, we had to somehow deal with the blinks of the respondent. For this purpose we prepared a function to remove the blink candidates from the dataset. First, we have to detect the blinks. For this we used the lower estimate of the average human pupil size, which is 2 mm [1]. To ensure that we will not remove a large amount of correct data from a participant with a small pupil size, we detect a blink when the pupil size is less than 1 mm (half of the lower average estimate). We also used the estimate of blink duration from [2], which states that a blink lasts approximately $1/3$ s. Based on this estimate, we removed a 160 ms interval before and after the detected blink peak to remove the data affected by the blink.
This preprocessing could also help remove erroneous pupil size data. The implementation of the blink-removal function is in the code blocks below.
def remove_blinks(
    data: pd.DataFrame,
    pupil_column='lp',
    blink_threshold=100,
    blink_duration=80,  # Remove 160 ms before and after the blink.
):
    """
    Removes all blink candidates and their adjacent intervals from the data.
    """
    new_data = data.copy()
    # Cast to float so the masked samples can hold NaN.
    new_data[pupil_column] = new_data[pupil_column].astype("float")
    num_rows = new_data.shape[0]
    # Find all positions of blink candidates (pupil size lower than `blink_threshold`).
    blink_peaks = np.where(new_data[pupil_column].lt(blink_threshold))[0]
    pupil_pos = new_data.columns.get_loc(pupil_column)
    for peak in blink_peaks:
        # For each blink candidate, mask the data around the blink
        # (to eliminate the whole blinking interval). Positional (`iloc`)
        # indexing is used because the index labels may not start at 0.
        new_data.iloc[
            max(0, peak - blink_duration):
            min(peak + blink_duration + 1, num_rows),
            pupil_pos
        ] = np.nan
    return new_data.dropna()
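The masking logic can be illustrated on a short synthetic pupil trace (the values, threshold, and window size below are made up for illustration; positional indexing is used so the sketch does not depend on the index labels):

```python
import numpy as np
import pandas as pd

# Synthetic pupil trace with one blink dip below the threshold at position 5.
trace = pd.Series([800, 810, 805, 790, 795, 50, 792, 801, 799, 804])
threshold, window = 100, 2  # mask `window` samples on each side of a dip

masked = trace.copy().astype("float")
for peak in np.where(trace.lt(threshold))[0]:
    # Half-open positional slice: positions peak-window .. peak+window inclusive.
    masked.iloc[max(0, peak - window):peak + window + 1] = np.nan
print(masked.dropna().tolist())  # → [800.0, 810.0, 805.0, 799.0, 804.0]
```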
def plot_blinks(
    data: pd.DataFrame,
    trial_id,
):
    """
    Plots the left pupil size over time for the chosen trial, before and after blink removal.
    """
    before = data[data.trial == trial_id].assign(blinks_removed='No')
    after = remove_blinks(before).assign(blinks_removed='Yes')
    combined_data = pd.concat([before, after])
    g = sns.relplot(
        data=combined_data,
        x='time',
        y='lp',
        kind='line',
        row='blinks_removed',
        aspect=3
    )
    g.set_xlabels("Time (ms)")
    g.set_ylabels("Pupil size")
    g.axes[0, 0].set_title(f"Original pupil size: trial {trial_id}")
    g.axes[1, 0].set_title(f"Pupil size with blinks removed: trial {trial_id}")
The first plot below shows the left pupil size over time for trial 045. The second plot shows the same data after removing the blink candidates. It is easy to estimate from the plot where the blinks probably happened. Comparing the data before and after pre-processing, we can say that the blink-removal function worked fairly well. We have chosen trial 045 because it is an example with a large number of blink candidates and nicely illustrates the effect of the pre-processing (though we can also see some blink candidates which were probably erroneously left in the data).
plot_blinks(hyp2_var, "045")
The following code blocks show the final pre-processing of the data. We removed the blinks for each trial, then computed the mean of the left pupil size for each trial, and finally removed the trial column.
# Remove blinks and compute mean left pupil size for each trial.
hyp2_var_avg_pupil = hyp2_var\
.groupby("trial")\
.apply(remove_blinks)\
.reset_index(drop=True)\
.groupby(["trial", "task"])\
.agg(avg_lp=("lp", "mean"))\
.reset_index()\
.drop("trial", axis=1)
hyp2_var_avg_pupil.head()
| task | avg_lp | |
|---|---|---|
| 0 | Natural | 1755.973668 |
| 1 | Puzzle | 2098.998261 |
| 2 | Puzzle | 1971.509708 |
| 3 | Blank | 1511.975890 |
| 4 | Puzzle | 1654.727015 |
Hypothesis Testing¶
First, we define the null and alternative hypotheses (in fact we test 3 variants, one for each alternative task type). Let $\mu_0$ be the mean left pupil size for the Blank task type and $\mu_t$ the mean left pupil size for one of the other types. Note that $t$ specifies the task type (so we have 3 different variants of $\mu_t$, one for each alternative task type).
- $H_0$ (null hypothesis): $\mu_0 = \mu_t$
- $H_1$ (alternative hypothesis): $\mu_0 \neq \mu_t$
We first compute the summary statistics for the mean left pupil size avg_lp for each task type.
hyp2_var_avg_pupil.groupby("task").describe().round(decimals=2)
| avg_lp | ||||||||
|---|---|---|---|---|---|---|---|---|
| count | mean | std | min | 25% | 50% | 75% | max | |
| task | ||||||||
| Blank | 15.0 | 1301.41 | 240.46 | 909.63 | 1153.67 | 1286.39 | 1433.16 | 1832.80 |
| Natural | 15.0 | 1630.13 | 197.86 | 1127.36 | 1544.63 | 1602.10 | 1763.40 | 1937.58 |
| Puzzle | 15.0 | 1756.54 | 199.33 | 1398.00 | 1626.31 | 1755.27 | 1871.95 | 2099.00 |
| Waldo | 15.0 | 1617.98 | 146.14 | 1367.17 | 1504.26 | 1636.79 | 1717.40 | 1891.75 |
Then, we look at the distribution of the mean left pupil size for each task type. Each distribution looks roughly normal, so the assumption for the t-test is probably fulfilled.
The plots of the distributions can be seen in the code blocks below.
# Plot the `avg_lp` for each task.
g = sns.displot(
data=hyp2_var_avg_pupil,
x='avg_lp',
hue='task',
col="task",
kde=True,
)
g.set_xlabels("Average left pupil size")
g.set_ylabels("Count")
<seaborn.axisgrid.FacetGrid at 0x7f6b31f03c50>
The assumptions of the t-test are that the observations should be independent and that their distribution should be normal. The independence is fulfilled because we have chosen only data from one participant. As stated above, the normality assumption is probably fulfilled based on the plotted results. However, since we are not completely sure about this, we decided to also test the results with the Wilcoxon rank sum test. This test is a non-parametric analogue of the t-test which does not assume anything specific about the distribution.
We performed paired t-test and Wilcoxon rank sum test for each pair of Blank and alternative task type on avg_lp values.
# Perform t-tests and Wilcoxon rank sum tests:
for alt_task in ["Natural", "Puzzle", "Waldo"]:
ttest_result = stats.ttest_rel(
hyp2_var_avg_pupil[hyp2_var_avg_pupil.task == "Blank"].avg_lp,
hyp2_var_avg_pupil[hyp2_var_avg_pupil.task == alt_task].avg_lp,
)
wilcoxon_result = stats.ranksums(hyp2_var_avg_pupil[hyp2_var_avg_pupil.task == "Blank"].avg_lp,
hyp2_var_avg_pupil[hyp2_var_avg_pupil.task == alt_task].avg_lp
)
print(f"Results for the paired t-test for the alternative task type '{alt_task}' are: {ttest_result}")
print(f"Results for the Wilcoxon rank sum test for the alternative task type '{alt_task}' are: {wilcoxon_result}")
print()
Results for the paired t-test for the alternative task type 'Natural' are: Ttest_relResult(statistic=-4.296009019122196, pvalue=0.0007392300421336361)
Results for the Wilcoxon rank sum test for the alternative task type 'Natural' are: RanksumsResult(statistic=-3.5048967281888372, pvalue=0.00045678443326107037)

Results for the paired t-test for the alternative task type 'Puzzle' are: Ttest_relResult(statistic=-7.094285178256108, pvalue=5.385082398716744e-06)
Results for the Wilcoxon rank sum test for the alternative task type 'Puzzle' are: RanksumsResult(statistic=-4.085589677237875, pvalue=4.3965015355347056e-05)

Results for the paired t-test for the alternative task type 'Waldo' are: Ttest_relResult(statistic=-4.955580613728317, pvalue=0.00021124301829198586)
Results for the Wilcoxon rank sum test for the alternative task type 'Waldo' are: RanksumsResult(statistic=-3.5463747959780543, pvalue=0.0003905702877707525)
Resulting p-values for different alternative task types for paired t-test are:
Natural: 0.000739
Puzzle: 5.385e-06
Waldo: 0.000211
And for the Wilcoxon rank sum test:
Natural: 0.000457
Puzzle: 4.397e-05
Waldo: 0.000391
Using the significance level $\alpha = 0.05$ we reject the null hypothesis $H_0$ for all variants of the alternative task types (Natural, Puzzle, Waldo), for both the paired t-test and the Wilcoxon rank sum test. The mean pupil size $\mu_0$ in the Blank task type differs from all the other means $\mu_t$ of the alternative task types in the FreeViewing task for participant 059.
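Since three null hypotheses are tested against the same Blank baseline, a multiple-comparisons correction such as Bonferroni could additionally be considered; a minimal sketch using the rounded t-test p-values reported above suggests the conclusions would not change:

```python
# Bonferroni correction for the three paired t-tests reported above.
p_values = {"Natural": 0.000739, "Puzzle": 5.385e-06, "Waldo": 0.000211}
alpha = 0.05
adjusted_alpha = alpha / len(p_values)  # 0.05 / 3 ≈ 0.0167
rejected = {task: p < adjusted_alpha for task, p in p_values.items()}
print(rejected)  # → {'Natural': True, 'Puzzle': True, 'Waldo': True}
```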
Discussion¶
Different Pupil Size in Time¶
The original hypothesis stated that the pupil size will also change over time. Unfortunately, we did not have time to test this part of the hypothesis in depth. At least, we prepared plots of the left pupil size over time for the arbitrarily selected trials 032, 037, 038 and 039 of participant 059, with blinks removed from the data. Each trial represents a different task type.
Based on the plots we can assume that during the Blank task the participants were the least excited, which led to a roughly constant, small pupil size over time. Surprisingly, the most similar pupil size curves were those of the Natural and Waldo task types, although we can see some peaks in pupil size in the Waldo task, which could be the moments when the participant found Waldo. For the Puzzle task type there are significant peaks in the pupil size. These peaks could be the moments when the participant found a difference between the pictures.
It is important to state that these ideas are based only on observations of the plots of one representative trial per task type, and thus could be misinterpreted. The code and plots mentioned above are in the following code blocks.
# Plot pupil size in time for randomly chosen trials for each task type:
trial_types = {
"032": "Puzzle",
"037": "Blank",
"038": "Natural",
"039": "Waldo"
}
# Filter data for the specified trials and remove blinks.
before = hyp2_var[hyp2_var.trial.isin(trial_types.keys())]
after = before.groupby("trial").apply(remove_blinks).reset_index(drop=True)
# Create labels for the plots.
after["task_type"] = after["trial"].map(trial_types).astype(str)
# Rename the trials to "{trial_number} : {task_type}" (for plotting).
after["trial"] = after["trial"] + " : " + after["task_type"]
# Plot the data:
# Each task type in separate plot.
g = sns.relplot(data=after, x="time", y="lp", kind="line", col="trial", hue="trial", col_wrap=2, aspect=2)
g.set_axis_labels("Time (ms)", "Pupil Size")
g.fig.suptitle("Pupil Size Over Time by Trial (separated task types)")
# All task types in one plot.
g = sns.relplot(data=after, x="time", y="lp", kind="line", hue="trial", aspect=3)
g.set_axis_labels("Time (ms)", "Pupil Size")
plt.title("Pupil Size Over Time (all task types together)")
Text(0.5, 1.0, 'Pupil Size Over Time (all task types together)')
Discussion of the Hypothesis Testing¶
We have rejected the null hypothesis $H_0$ for all variants of the task types. The reason for this result could be that the participant during the Blank task type is mainly bored and their pupil size is therefore small. This idea is also supported by the observations of pupil size over time, which show that the pupil size in the Blank task type was the smallest among the chosen representatives of each task type. On the other hand, the participants are probably more excited while solving a problem, as in the Puzzle or Waldo tasks, or while looking around the picture (the Natural task).
Finally, we have to state that the results were tested only on data from one participant, and the results for a more diverse sample of participants could differ. In future work it might be appropriate to test the hypotheses on the whole set of participants and to explore the relationship between each task type and pupil size over time more deeply.